Combining software cache partitioning and loop tiling for effective shared cache management
One of the biggest challenges in multicore platforms is shared cache management, especially for data-dominant
applications. Two commonly used approaches for increasing shared cache utilization are cache partitioning
and loop tiling. However, state-of-the-art compilers lack efficient cache partitioning and loop tiling
methods for two reasons. First, cache partitioning and loop tiling are strongly coupled, so addressing
them separately is not effective. Second, cache partitioning and loop tiling must be tailored
to the architectural details of the target shared cache and to the memory characteristics of the co-running workloads.
To the best of our knowledge, this is the first methodology that provides i) a theoretical foundation
for the above-mentioned cache management mechanisms and ii) a unified framework to orchestrate these
two mechanisms in tandem (not separately). Our approach lowers the number of main memory
accesses by an order of magnitude while keeping the number of arithmetic/addressing instructions
at a minimal level. We motivate this work by showing that cache partitioning, loop tiling, data
array layouts, shared cache architecture details (i.e., cache size and associativity) and the memory reuse
patterns of the executing tasks must be addressed together as one problem when a (near-)optimal solution
is required. To this end, we present a search space exploration analysis in which our proposal offers
a vast reduction in the required search space.
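To illustrate why tile size and cache partition size are coupled, the following is a minimal sketch and not the methodology of the paper. It assumes a way-based partition of the shared cache and a simple rule of thumb (three square tiles of 8-byte elements should fit in the task's partition); the function names `tile_size_for_partition` and `matmul_tiled` and all parameters are hypothetical.

```python
import math


def tile_size_for_partition(cache_bytes, ways, ways_assigned,
                            elem_bytes=8, tiles_resident=3):
    """Pick a square tile edge T so that `tiles_resident` T x T tiles fit
    in the cache partition given to one task.

    Assumption (not from the paper): the cache is partitioned by ways, so a
    task that owns `ways_assigned` of `ways` ways gets that share of the bytes.
    """
    partition_bytes = cache_bytes // ways * ways_assigned
    return max(1, math.isqrt(partition_bytes // (tiles_resident * elem_bytes)))


def matmul_tiled(a, b, n, tile):
    """Tiled n x n matrix multiplication on lists of lists.

    The three outer loops walk over tiles; the three inner loops work inside
    one tile, so the data touched by the inner loops stays small.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


if __name__ == "__main__":
    # Hypothetical 8 MiB, 16-way shared cache; the task owns 4 ways.
    t = tile_size_for_partition(8 * 1024 * 1024, ways=16, ways_assigned=4)
    print("tile edge:", t)
    print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], n=2, tile=1))
```

Under these assumptions, giving the task fewer ways shrinks the usable tile, which is one way to see that the two decisions cannot be tuned independently. The rule ignores associativity conflicts and data layout, which the abstract names as further factors.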
Cache partitioning + loop tiling: A methodology for effective shared cache management
In this paper, we present a new methodology that provides i) a theoretical analysis of the two most commonly used approaches for effective shared cache management (i.e., cache partitioning and loop tiling) and ii) a unified framework to fine-tune those two mechanisms in tandem (not separately). Our approach lowers the number of main memory accesses by one order of magnitude while keeping the number of arithmetic/addressing instructions at a minimal level. We also present a search space exploration analysis in which our proposal offers a vast reduction in the required search space.
The LPGPU2 Project: Low-Power Parallel Computing on GPUs : Extended Abstract
The LPGPU2 project is a 30-month project (Innovation Action) funded by the European Union. Its overall goal is to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To achieve this overall goal, several key objectives need to be achieved. First, several applications (use cases) need to be developed for or ported to low-power GPUs. Thereafter, these applications need to be optimized using the tooling framework. In addition, power measurement devices and power models need to be developed that are 10× more accurate than the state of the art. The project consortium actively promotes open vendor-neutral standards via the Khronos group. This paper briefly reports on the achievements made in the first half of the project, and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite. EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU
Enabling GPU software developers to optimize their applications – The LPGPU2 approach
Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever-increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is a 30-month, EU-funded Innovation Action that aims to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To this end, the project follows a holistic approach. First, several applications (use cases) are being developed for or ported to low-power GPUs. These applications will be optimized using the tooling framework in the last phase of the project. In addition, power measurement devices and power models are being devised that are 10× more accurate than the state of the art. The ultimate goal of the project is to promote open vendor-neutral standards via the Khronos group. This paper briefly reports on the achievements made in the first phase of the project (up to month 18) and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite. EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU